8 February 2016

What happened?

  • Packages
  • Big business
  • Miscellaneous

Still growing

Package dominance

                  Estimate   95% CI
Median               0.6     0.6 to 0.6
Upper quartile       1.2     1.1 to 1.2
Top 5%               9.9     8.9 to 10.9
Top 1%             182       139 to 225
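
The slide does not say how the estimates and intervals above were computed. One common way to get a quantile with a 95% CI is a percentile bootstrap. The sketch below assumes that approach and uses simulated data; `downloads` and its log-normal shape are made up for illustration and are not the numbers behind the table.

```r
set.seed(1)

# Simulated, heavy-tailed "downloads per day" values (an assumption,
# NOT the data used for the slide).
downloads <- rlnorm(2000, meanlog = -0.5, sdlog = 1.5)

probs <- c(median = 0.5, upper_quartile = 0.75, top_5 = 0.95, top_1 = 0.99)

# Point estimates of the quantiles.
estimate <- quantile(downloads, probs, names = FALSE)

# Percentile bootstrap: resample with replacement and recompute the quantiles.
boot <- replicate(
  500,
  quantile(sample(downloads, replace = TRUE), probs, names = FALSE)
)

# One row per quantile in `boot`; take the 2.5% and 97.5% points of each row.
ci <- apply(boot, 1, quantile, probs = c(0.025, 0.975))

result <- data.frame(
  estimate  = estimate,
  lower     = ci[1, ],
  upper     = ci[2, ],
  row.names = names(probs)
)
print(round(result, 2))
```

With heavy-tailed data the interval widens sharply for the top 1%, which is the same pattern the table shows.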

Biggest package trends

Major themes

  • web
  • graphics
  • package building blocks

xml2

Author: Hadley Wickham

library("xml2")
txt <- 
"<foo> 
  <bar> text <baz/> 
  </bar>
</foo>"
x <- read_xml(txt)
x
## {xml_document}
## <foo>
## [1] <bar> text <baz/> \n  </bar>
xml_children(x)
## {xml_nodeset (1)}
## [1] <bar> text <baz/> \n  </bar>
xml_find_all(x, "//baz")
## {xml_nodeset (1)}
## [1] <baz/>
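
Once nodes are selected, the usual next step is to pull values out of them. A minimal sketch, assuming the same small document as above (written on one line here) and using `xml_name()` and `xml_text()` from xml2:

```r
library(xml2)

# Same structure as the slide's example document.
x <- read_xml("<foo> <bar> text <baz/> </bar> </foo>")

# Tag names of the root's children.
child_names <- xml_name(xml_children(x))

# Text inside <bar>, with surrounding whitespace trimmed.
bar_text <- xml_text(xml_find_all(x, "//bar"), trim = TRUE)

# Number of <baz> nodes anywhere in the document.
n_baz <- length(xml_find_all(x, "//baz"))

child_names
bar_text
n_baz
```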

rversions

Authors: Gabor Csardi & Jeroen Ooms

library(rversions)
r_release()
##    version                        date
## 97   3.2.3 2015-12-10T08:13:08.415370Z
r_versions() %>% 
  tail
##    version                        date
## 92   3.1.2 2014-10-31T08:11:32.082768Z
## 93   3.1.3 2015-03-09T08:12:20.229070Z
## 94   3.2.0 2015-04-16T07:13:33.144514Z
## 95   3.2.1 2015-06-18T07:15:04.589869Z
## 96   3.2.2 2015-08-14T07:13:18.272871Z
## 97   3.2.3 2015-12-10T08:13:08.415370Z

git2r

Author: Stefan Widgren

library(git2r)
repo <- repository(".")  # open the existing repository; init() is for creating a new one
commits(repo) %>% 
  head(3)
## [[1]]
## [024ad5f] 2016-02-15: Added top charts
## 
## [[2]]
## [fc5fa42] 2016-02-14: Started working on R-trend presentation
## 
## [[3]]
## [d51d5dd] 2015-02-15: Added the try-catch approach

DiagrammeR

Authors: Iannone, Sveidqvist, Bostock, Pettitt, Daines, Kashcha

library(DiagrammeR)
DiagrammeR("
  graph LR
    A-->B
    A-->C
    C-->E
    B-->D
    C-->D
    D-->F
    E-->F
", height = 200, width = 400)

DiagrammeR continued

DiagrammeR::mermaid("
gantt
    title A Gantt Diagram

    section Section
    A task           :a1, 2014-01-01, 30d
    Another task     :after a1  , 20d
    section Another
    Task in sec      :2014-01-12  , 12d
    another task     : 24d
", width = 600, height = 250)

Top codeRs 2015

Top coders 2015
Rank  Coder              Total avg. downloads per day  No. of packages
 1    Gabor Csardi       2,312                         11
 2    Stefan Widgren     1,563                          1
 3    RStudio              781                         16
 4    Hadley Wickham       695                         12
 5    Jeroen Ooms          541                         10
 6    Richard Cotton       501                         22
 7    R Foundation         490                          1
 8    David Hoerl          455                          1
 9    Sindre Sorhus        409                          2
10    Richard Iannone      294                          2

  • Gabor Csardi - Harvard (rversions)
  • Jeroen Ooms - UCLA (xml2)
  • Richard Cotton - Live Analytics (assertive)

Top codeRs 2010-2015

Top coders 2010-2015
Rank  Coder                            Total avg. downloads per day  No. of packages
 1    Hadley Wickham                   32,115                        55
 2    Yihui Xie                         9,739                        18
 3    RStudio                           9,123                        25
 4    Jeroen Ooms                       4,221                        25
 5    Justin Talbot                     3,633                         1
 6    Winston Chang                     3,531                        17
 7    Gabor Csardi                      3,437                        26
 8    Romain Francois                   2,934                        20
 9    Duncan Temple Lang                2,854                         6
10    Adrian A. Dragulescu              2,456                         2
11    JJ Allaire                        2,453                         7
12    Simon Urbanek                     2,369                        15
13    Dirk Eddelbuettel                 2,094                        33
14    Stefan Milton Bache               2,069                         3
15    Douglas Bates                     1,966                         5
16    Renaud Gaujoux                    1,962                         6
17    Jelmer Ypma                       1,933                         2
18    Rob J Hyndman                     1,933                         3
19    Baptiste Auguie                   1,924                         2
20    Ulrich Halekoh, Søren Højsgaard   1,764                         1
21    Martin Maechler                   1,682                        11
22    Mirai Solutions GmbH              1,603                         3
23    Stefan Widgren                    1,563                         1
24    Edwin de Jonge                    1,513                        10
25    Kurt Hornik                       1,476                        12
26    Deepayan Sarkar                   1,369                         4
27    Tyler Rinker                      1,203                         9
28    Yixuan Qiu                        1,131                        12
29    Revolution Analytics              1,011                         4
30    Torsten Hothorn                     948                         7

The R Consortium

Chair: Hadley W.

From the about page:

  • Create infrastructure and standards to benefit all.
  • Promote the R language in industry, academia, and government.
  • Create and promote best practices.

Members: Microsoft, RStudio, TIBCO, Google, HP, Oracle, …

R-Hub

R-Hub will modernize and improve the entire process of developing and testing R packages.

Goals

  1. Provide services that ease every step of the R package development process: creating a package, building binaries and running continuous integration, publishing, distributing, and maintaining it.
  2. Make these services free for all members of the community.
  3. Allow community contributions to R-Hub itself.
  4. Make CRAN maintainers' work easier by pre-testing CRAN package submissions.

My own experience

  • dplyr changed my life
  • dear magrittr, I just can't live without you
  • flexsurv I can actually explain to my colleagues…

SQL with dplyr

library(dplyr)
library(magrittr)
data_a <- 
  data.frame(id = 1:3,
             var1 = LETTERS[1:3])
data_b <- 
  data.frame(id = 1:3+2,
             var2 = LETTERS[1:3+2])

data_a %>% 
  left_join(data_b) %>% 
  knitr::kable()
id var1 var2
1 A NA
2 B NA
3 C C
data_a %>% 
  inner_join(data_b) %>% 
  knitr::kable()
id var1 var2
3 C C

More dplyr

data_a %>% 
  right_join(data_b) %>% 
  knitr::kable()
id var1 var2
3 C C
4 NA D
5 NA E

data_a %>% 
  anti_join(data_b) %>% 
  knitr::kable()
id var1
2 B
1 A
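
The joins above let dplyr pick the key by matching column names, which works here because `id` is the only shared column. A minimal sketch that names the key explicitly with `by = "id"` and adds the two joins not shown on the slides, `semi_join()` and `full_join()`. The data frames are re-created with `stringsAsFactors = FALSE` and integer ids, which is a small assumption that differs from the slide's definitions.

```r
library(dplyr)

data_a <- data.frame(id = 1:3, var1 = LETTERS[1:3], stringsAsFactors = FALSE)
data_b <- data.frame(id = 3:5, var2 = LETTERS[3:5], stringsAsFactors = FALSE)

# semi_join: rows of data_a that HAVE a match in data_b
# (the complement of anti_join); no columns from data_b are added.
kept <- semi_join(data_a, data_b, by = "id")

# full_join: every id from either table, with NA where a side is missing.
everything <- full_join(data_a, data_b, by = "id")

kept
everything
```

Naming the key also avoids surprises if the two tables later gain another column with the same name.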

Filter with dplyr

set.seed(98213)
data_c <- 
  data.frame(id = c(1,1,2,2,3),
             var1 = runif(5))
knitr::kable(data_c)
id var1
1 0.9736481
1 0.6692199
2 0.7389001
2 0.7678625
3 0.1859028
data_c %>% 
  group_by(id) %>% 
  filter(var1 == min(var1)) %>% 
  knitr::kable()
id var1
1 0.6692199
2 0.7389001
3 0.1859028
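
One caveat with `filter(var1 == min(var1))`: if a group has tied minima, every tied row is kept. A minimal sketch of the difference, using a small made-up data frame (`data_ties` is hypothetical, not the slide's `data_c`) and `slice(which.min(var1))` to keep exactly one row per group:

```r
library(dplyr)

# Hypothetical data with a tie inside group 1.
data_ties <- data.frame(id = c(1, 1, 2), var1 = c(0.5, 0.5, 0.2))

# Keeps both tied rows in group 1.
all_minima <- data_ties %>%
  group_by(id) %>%
  filter(var1 == min(var1)) %>%
  ungroup()

# Keeps only the first minimum in each group.
one_per_group <- data_ties %>%
  group_by(id) %>%
  slice(which.min(var1)) %>%
  ungroup()

nrow(all_minima)
nrow(one_per_group)
```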

Do with dplyr

# do() lets us run custom operations on each group
data_c %>% 
  group_by(id) %>% 
  do({
    if(nrow(.) == 1)
      return(.)
    .$var1 = min(.$var1)+max(.$var1)^2
    return(.[1,])
  }) %>% 
  knitr::kable()
id var1
1 1.6172106
2 1.3285129
3 0.1859028
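
When the custom operation returns a single value per group, `summarise()` can often replace `do()`. A minimal sketch, assuming the same logic as above (single-row groups are returned unchanged) and typing in the `data_c` values printed on the earlier slide rather than regenerating them:

```r
library(dplyr)

# Values copied from the data_c table shown earlier.
data_c <- data.frame(
  id   = c(1, 1, 2, 2, 3),
  var1 = c(0.9736481, 0.6692199, 0.7389001, 0.7678625, 0.1859028)
)

res <- data_c %>%
  group_by(id) %>%
  summarise(
    # Single-row groups keep their value; otherwise min + max^2.
    var1 = if (n() == 1) var1 else min(var1) + max(var1)^2
  )

res
```

This should reproduce the three rows in the table above; `do()` remains the more general tool when each group needs to return a whole data frame.
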
library(multidplyr)
data_d <- 
  data.frame(id = sample(1:10, size = 10^4, replace = TRUE),
             var1 = runif(10^4))
data_d %>% 
  partition(id) %>% 
  do({
    if(nrow(.) == 1)
      return(.)
    .$var1 = min(.$var1)+max(.$var1)^2
    return(.[1,])
  }) %>% 
  collect() %>% 
  arrange(id) %>% 
  tail(3) %>% 
  knitr::kable()
id var1
8 1.0003005
9 1.0020283
10 0.9981546